remmen-io

Dynamic Interval

Add a dynamic interval feature that automatically adjusts the time between pod terminations based on the number of candidate pods in your cluster. This helps ensure appropriate chaos levels in both small and large environments.

How it works

With dynamic interval enabled, chaoskube will calculate the interval between pod terminations using the following formula:

interval = totalWorkingMinutes / (podCount * factor)

Where:

  • totalWorkingMinutes = 10 days * 8 hours * 60 minutes = 4800 minutes (the assumption is that all candidate pods should be killed within 2 work weeks)
  • podCount is the number of candidate pods
  • factor is the configurable dynamic interval factor

The dynamic interval factor lets you control the aggressiveness of the terminations:

  • With factor = 1.0: Baseline interval (4800 minutes divided by the pod count)
  • With factor > 1.0: More aggressive terminations (shorter intervals)
  • With factor < 1.0: Less aggressive terminations (longer intervals)

Example scenarios

  • Small cluster (100 pods, factor 1.0): interval = 48 minutes
  • Small cluster (100 pods, factor 1.5): interval = 32 minutes
  • Small cluster (100 pods, factor 2.0): interval = 24 minutes
  • Large cluster (1500 pods, factor 1.0): interval ≈ 3 minutes (4800 / 1500 = 3.2)
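
As a rough illustration of the formula and the scenarios above, here is a minimal Go sketch. The names dynamicInterval and totalWorkingTime, and the fallback parameter for an empty cluster or a non-positive factor, are assumptions made for this example; they are not taken from the code in this PR.

```go
package main

import (
	"fmt"
	"time"
)

// totalWorkingTime is the budget from the description:
// 10 working days * 8 hours = 4800 minutes.
const totalWorkingTime = 10 * 8 * time.Hour

// dynamicInterval is a hypothetical helper, not chaoskube's actual code.
// It computes totalWorkingTime / (podCount * factor).
// Returning fallback for non-positive inputs is an assumption of this sketch.
func dynamicInterval(podCount int, factor float64, fallback time.Duration) time.Duration {
	if podCount <= 0 || factor <= 0 {
		return fallback
	}
	return time.Duration(float64(totalWorkingTime) / (float64(podCount) * factor))
}

func main() {
	fallback := 10 * time.Minute // arbitrary value chosen for the example

	fmt.Println(dynamicInterval(100, 1.0, fallback))  // 48m0s
	fmt.Println(dynamicInterval(100, 1.5, fallback))  // 32m0s
	fmt.Println(dynamicInterval(100, 2.0, fallback))  // 24m0s
	fmt.Println(dynamicInterval(1500, 1.0, fallback)) // 3m12s
	fmt.Println(dynamicInterval(0, 1.0, fallback))    // 10m0s (fallback)
}
```

Working with time.Duration instead of raw minutes keeps the sub-minute part of the result (3m12s for 1500 pods) rather than rounding it away; whether the real implementation rounds is not stated here.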


pods = filterByAnnotations(pods, c.Annotations)

podCount := len(pods)
Owner


The logic of finding all possible target pods (after filtering) should pretty much match the logic of Candidates() (https://github.com/linki/chaoskube/blob/master/chaoskube/chaoskube.go#L214). Let's try to re-use it.

Author

@remmen-io remmen-io Jul 22, 2025


Candidates() filters out too much, which would give a misleading pod count for the dynamic interval calculation.
For instance, it applies filterByMinimumAge and filterByOwnerReference, which do not make sense for this calculation.

Therefore I recreated the list, applying only the filters that are relevant for calculating the interval.
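
To make the trade-off in this thread concrete, here is a minimal, self-contained Go sketch of one possible structure: a shared list of "scope" filters is used for the interval count, and victim-only filters are applied on top of it when choosing a pod to kill. Every name in it (pod, podFilter, inNamespace, atLeastAge, applyFilters) is hypothetical and only loosely modeled on the filterBy* functions mentioned above; it is not chaoskube's API.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for a Kubernetes pod.
type pod struct {
	name       string
	namespace  string
	ageMinutes int
}

// podFilter narrows a list of pods.
type podFilter func([]pod) []pod

// keep returns the pods for which ok reports true.
func keep(pods []pod, ok func(pod) bool) []pod {
	out := make([]pod, 0, len(pods))
	for _, p := range pods {
		if ok(p) {
			out = append(out, p)
		}
	}
	return out
}

// inNamespace is a scope filter: it decides which pods are in play at all.
func inNamespace(ns string) podFilter {
	return func(pods []pod) []pod {
		return keep(pods, func(p pod) bool { return p.namespace == ns })
	}
}

// atLeastAge is a victim-only filter: it should not shrink the interval count.
func atLeastAge(minutes int) podFilter {
	return func(pods []pod) []pod {
		return keep(pods, func(p pod) bool { return p.ageMinutes >= minutes })
	}
}

// applyFilters runs the filters in order.
func applyFilters(pods []pod, filters ...podFilter) []pod {
	for _, f := range filters {
		pods = f(pods)
	}
	return pods
}

// samplePods returns made-up data for the example.
func samplePods() []pod {
	return []pod{
		{"api-1", "default", 120},
		{"api-2", "default", 2},
		{"dns-1", "kube-system", 500},
	}
}

func main() {
	scope := []podFilter{inNamespace("default")}
	victimOnly := []podFilter{atLeastAge(60)}

	forInterval := applyFilters(samplePods(), scope...)
	candidates := applyFilters(forInterval, victimOnly...)
	fmt.Println(len(forInterval), len(candidates)) // prints "2 1"
}
```

With this split, both code paths reuse the same scope filters, while the interval count is not reduced by filters that only matter when picking a victim.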

pods = filterByOwnerReference(pods)
c.Logger.WithFields(log.Fields{
	"count": len(pods),
}).Debug("Final pod count after owner reference filtering")
Owner


We could think about moving the logging to the end of each filterBy* function instead of having it here.
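
A minimal sketch of that idea follows, using a wrapper variant: instead of editing each filter, the filter is wrapped so that the remaining pod count is logged right after it runs. The standard library log package stands in for the project's logger here, and pod, podFilter, withCountLog, keepOwned, and runDemo are hypothetical names for illustration only.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
)

// pod is a hypothetical stand-in for a Kubernetes pod.
type pod struct {
	name     string
	hasOwner bool
}

// podFilter narrows a list of pods.
type podFilter func([]pod) []pod

// withCountLog wraps a filter so that the number of remaining pods
// is logged once the filter has run.
func withCountLog(logger *log.Logger, name string, f podFilter) podFilter {
	return func(pods []pod) []pod {
		out := f(pods)
		logger.Printf("filter=%s count=%d", name, len(out))
		return out
	}
}

// keepOwned is a made-up filter for this example: it keeps pods that
// have an owner. It does not mirror any real chaoskube filter.
func keepOwned(pods []pod) []pod {
	out := make([]pod, 0, len(pods))
	for _, p := range pods {
		if p.hasOwner {
			out = append(out, p)
		}
	}
	return out
}

// runDemo applies the wrapped filter to sample data and returns the
// remaining pod count together with the captured log output.
func runDemo() (int, string) {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0) // no prefix, no timestamp
	filter := withCountLog(logger, "keepOwned", keepOwned)
	out := filter([]pod{{"web-1", true}, {"debug-shell", false}})
	return len(out), buf.String()
}

func main() {
	n, logged := runDemo()
	fmt.Printf("%d pod(s) left; log: %s", n, logged)
}
```

Either placement (inside each function or in a wrapper) removes the repeated logging statements from the call site; the wrapper just avoids touching every filter body.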

@remmen-io
Author

Closing this PR to retarget it to a frozen branch.
See new PR #666 which targets a stable reference branch.

@remmen-io remmen-io closed this Sep 3, 2025